About the Provider
Moonshot AI is a Chinese AI research company focused on building large-scale foundation models with advanced agentic and multimodal capabilities. Kimi K2 Thinking is its flagship open-weights reasoning model and the first open-source model to outperform leading closed-source models, including GPT-5 and Claude 4.5 Sonnet, across major benchmarks.

Model Quickstart
This section helps you get started quickly with the moonshotai/Kimi-K2-Thinking model on the Qubrid AI inference platform.
To use this model, you need:
- A valid Qubrid API key
- Access to the Qubrid inference API
- Basic knowledge of making API requests in your preferred language
Once these are in place, you can send requests to the moonshotai/Kimi-K2-Thinking model and receive responses based on your input prompts.
Below are example placeholders showing how the model can be accessed from different programming environments. You can choose the one that best fits your workflow.
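As one such placeholder, here is a minimal Python sketch that builds a chat-completion request with the defaults from the Inference Parameters section below. The endpoint URL and header names are assumptions for illustration; the real base URL and authentication scheme come from your Qubrid account documentation.

```python
import json
import os

# Hypothetical endpoint -- substitute the actual Qubrid base URL.
QUBRID_API_URL = "https://api.qubrid.example/v1/chat/completions"

def build_request(prompt: str) -> dict:
    """Build a chat-completion payload using the documented defaults
    (temperature 1.0, top_p 0.95, 16384 max tokens, streaming on)."""
    return {
        "model": "moonshotai/Kimi-K2-Thinking",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "max_tokens": 16384,
        "stream": True,
    }

def auth_headers() -> dict:
    # Read the API key from the environment rather than hard-coding it.
    return {
        "Authorization": f"Bearer {os.environ.get('QUBRID_API_KEY', '')}",
        "Content-Type": "application/json",
    }

if __name__ == "__main__":
    payload = build_request("Explain mixture-of-experts routing in two sentences.")
    print(json.dumps(payload, indent=2))
```

Assuming the endpoint accepts this shape, `requests.post(QUBRID_API_URL, headers=auth_headers(), json=payload, stream=True)` would then submit the prompt and stream the response.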
Model Overview
Kimi K2 Thinking is the first open-weights model to achieve SOTA performance against leading closed-source models, including GPT-5 and Claude 4.5 Sonnet, across major benchmarks: HLE (44.9%), BrowseComp (60.2%), and SWE-Bench Verified (71.3%).
- Built on a 1T-parameter MoE architecture with 32B active parameters per token and native INT4 quantization via QAT, it runs at 2x the speed of FP8 deployments.
- The model maintains stable tool-use across 200–300 sequential calls within a 256K context window, with interleaved chain-of-thought and dynamic tool calling for complex agentic workflows.
Model at a Glance
| Feature | Details |
|---|---|
| Model ID | moonshotai/Kimi-K2-Thinking |
| Provider | Moonshot AI |
| Architecture | Sparse MoE Transformer — 1T total / 32B active per token, native INT4 via Quantization-Aware Training |
| Model Size | 1T total / 32B active parameters |
| Context Length | 256K Tokens |
| Release Date | 2025 |
| License | Apache 2.0 |
| Training Data | Large-scale multilingual dataset with RL post-training for agentic reasoning and tool-use |
When to use?
You should consider using Kimi K2 Thinking if:
- You need complex agentic research workflows with multi-step tool orchestration
- Your application requires long-horizon coding and debugging
- You are solving advanced mathematical reasoning tasks
- Your use case involves autonomous writing and analysis
- You need a model that outperforms GPT-5 and Claude 4.5 Sonnet on open benchmarks
- Your workflow requires stable tool use across 200–300 sequential calls
Inference Parameters
| Parameter Name | Type | Default | Description |
|---|---|---|---|
| Streaming | boolean | true | Enable streaming responses for real-time output. |
| Temperature | number | 1 | Sampling temperature; 1.0 is recommended for Kimi-K2-Thinking. |
| Max Tokens | number | 16384 | Maximum number of tokens to generate. |
| Top P | number | 0.95 | Controls nucleus sampling. |
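With `Streaming` enabled, the response typically arrives as a sequence of server-sent events. The sketch below assumes OpenAI-style `data:` lines carrying a `choices[0].delta.content` field; the actual wire format is an assumption to verify against the platform's API reference.

```python
import json

def parse_sse_line(line: str):
    """Extract the text delta from one streamed event line, if any.

    Assumes OpenAI-style events of the form
    `data: {"choices": [{"delta": {"content": ...}}]}` terminated by
    `data: [DONE]` -- an assumed schema, not a documented Qubrid one.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None  # ignore comments, pings, blank keep-alives
    body = line[len("data:"):].strip()
    if body == "[DONE]":
        return None  # end-of-stream sentinel
    event = json.loads(body)
    return event["choices"][0]["delta"].get("content")

# Replaying two illustrative events locally:
sample = [
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
text = "".join(t for t in (parse_sse_line(l) for l in sample) if t)
print(text)  # Hello, world
```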
Key Features
- First Open-Source to Beat Closed Frontier Models: Achieves SOTA on HLE (44.9%), BrowseComp (60.2%), and SWE-Bench Verified (71.3%) — surpassing GPT-5 and Claude 4.5 Sonnet.
- Native INT4 via QAT: Quantization-Aware Training enables INT4 inference at 2x the speed of FP8 without accuracy loss.
- Stable Long-Horizon Tool Use: Maintains consistent tool-calling behaviour across 200–300 sequential calls within a single context window.
- Interleaved Chain-of-Thought: Dynamically interleaves reasoning traces with tool calls for interpretable agentic execution.
- 1T MoE Architecture: Frontier-scale capacity with only 32B parameters active per token for efficient inference.
- 256K Context Window: Supports long-horizon document analysis, multi-turn agentic tasks, and extended reasoning chains.
Summary
Kimi K2 Thinking is Moonshot AI’s flagship open-weights reasoning model and the first open-source model to surpass closed frontier models on major benchmarks.
- It uses a 1T MoE Transformer with 32B active parameters and native INT4 via QAT, running at 2x the speed of FP8 deployments.
- It achieves SOTA on HLE (44.9%), BrowseComp (60.2%), and SWE-Bench Verified (71.3%), outperforming GPT-5 and Claude 4.5 Sonnet.
- The model supports 256K context, stable 200–300 sequential tool calls, and interleaved chain-of-thought reasoning.
- Licensed under Apache 2.0 for full commercial use.